Data Science Ethics Checklist 


Data Collection 


A.1 Informed consent: If there are human subjects, have they given
informed consent, where subjects affirmatively opt-in and have a clear 
understanding of the data uses to which they consent? 

A.2 Collection bias: Have we considered sources of bias that could
be introduced during data collection and survey design and taken steps 
to mitigate those? 

A.3 Limit PII exposure: Have we considered ways to minimize
exposure of personally identifiable information (PII), for example
through anonymization or not collecting information that isn't relevant
for analysis?

A.4 Downstream bias mitigation: Have we considered ways to
enable testing downstream results for biased outcomes (e.g., collecting
data on protected group status like race or gender)?
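
As one illustration of item A.3, the sketch below drops fields the analysis does not need and replaces a direct identifier with a keyed hash. The field names (`email`, `name`, `phone`) and the `pseudonymize` helper are hypothetical and not part of the checklist. Keyed hashing is pseudonymization rather than full anonymization, since the remaining fields may still allow re-identification.

```python
import hashlib
import hmac

# Hypothetical field names; adjust to the actual schema.
DIRECT_IDENTIFIERS = {"email"}         # replaced by a keyed hash
IRRELEVANT_FIELDS = {"name", "phone"}  # not needed for analysis, so dropped


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` with reduced PII exposure."""
    cleaned = {}
    for field, value in record.items():
        if field in IRRELEVANT_FIELDS:
            continue  # do not keep what the analysis does not need
        if field in DIRECT_IDENTIFIERS:
            # HMAC-SHA256 gives a stable token that cannot be recomputed
            # without the secret key.
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()
        else:
            cleaned[field] = value
    return cleaned
```

The same key maps the same identifier to the same token, so records can still be joined; the key itself would need to be stored separately from the data.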


Data Storage 


B.1 Data security: Do we have a plan to protect and secure data
(e.g., encryption at rest and in transit, access controls on internal users 
and third parties, access logs, and up-to-date software)? 

B.2 Right to be forgotten: Do we have a mechanism through which
an individual can request their personal information be removed?


B.3 Data retention plan: Is there a schedule or plan to delete the
data after it is no longer needed?
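
A retention schedule such as the one in item B.3 could be enforced with a periodic job along these lines. This is a minimal sketch: the one-year window, the `collected_at` field, and the `expired_records` helper are assumptions made for illustration only.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # assumed policy, not taken from the checklist


def expired_records(records, now=None, retention=RETENTION):
    """Return the records whose 'collected_at' is older than the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    return [r for r in records if r["collected_at"] < cutoff]
```

The returned records are candidates for deletion; the actual delete step depends on the storage system and should also cover backups and derived copies.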


Analysis 


C.1 Missing perspective: Have we sought to address blind spots in
the analysis through engagement with relevant stakeholders (e.g., 
checking assumptions and discussing implications with affected 
communities and subject matter experts)? 

C.2 Dataset bias: Have we examined the data for possible sources of 
bias and taken steps to mitigate or address these biases (e.g., 
stereotype perpetuation, confirmation bias, imbalanced classes, or 
omitted confounding variables)? 

C.3 Honest representation: Are our visualizations, summary 
statistics, and reports designed to honestly represent the underlying 
data? 

C.4 Privacy in analysis: Have we ensured that data with PII are not 
used or displayed unless necessary for the analysis? 

C.5 Auditability: Is the process of generating the analysis well
documented and reproducible if we discover issues in the future?
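
Of the bias sources named in item C.2, imbalanced classes are the easiest to check mechanically. The sketch below reports each label's share of the data and flags labels under a threshold. The 10% threshold and both function names are hypothetical choices, and this check says nothing about the other bias sources listed.

```python
from collections import Counter


def class_shares(labels):
    """Map each label to its fraction of the dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def underrepresented(labels, min_share=0.10):
    """Return, in sorted order, the labels whose share is below `min_share`."""
    return sorted(label for label, share in class_shares(labels).items()
                  if share < min_share)
```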


Modeling 


D.1 Proxy discrimination: Have we ensured that the model does not
rely on variables or proxies for variables that are unfairly discriminatory?


D.2 Fairness across groups: Have we tested model results for fairness
with respect to different affected groups (e.g., tested for disparate error
rates)?


D.3 Metric selection: Have we considered the effects of optimizing for 
our defined metrics and considered additional metrics? 

D.4 Explainability: Can we explain in understandable terms a decision 
the model made in cases where a justification is needed? 

D.5 Communicate bias: Have we communicated the shortcomings,
limitations, and biases of the model to relevant stakeholders in ways that
can be generally understood?
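
The disparate error rate test mentioned in item D.2 can be sketched as a per-group comparison of false positive and false negative rates. The example assumes binary labels coded 0/1 and a group attribute recorded for each row; `error_rates_by_group` is a hypothetical helper, and error-rate parity is only one of several fairness criteria.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: {"fpr": ..., "fnr": ...}} for binary labels 0/1."""
    tally = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        t = tally[group]
        if truth == 1:
            t["pos"] += 1
            if pred == 0:
                t["fn"] += 1  # missed a true positive
        else:
            t["neg"] += 1
            if pred == 1:
                t["fp"] += 1  # flagged a true negative
    rates = {}
    for group, t in tally.items():
        rates[group] = {
            # None marks a rate that is undefined for this group.
            "fpr": t["fp"] / t["neg"] if t["neg"] else None,
            "fnr": t["fn"] / t["pos"] if t["pos"] else None,
        }
    return rates
```

Large gaps between groups in either rate are a signal to investigate further, not a verdict by themselves; small groups produce noisy rates.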


Deployment 


E.1 Redress: Have we discussed with our organization a plan for a 
response if users are harmed by the results (e.g., how does the data science 
team evaluate these cases and update analysis and models to prevent 
future harm)? 

E.2 Rollback: Is there a way to turn off or roll back the model in 
production if necessary? 

E.3 Concept drift: Do we test and monitor for concept drift to ensure the 
model remains fair over time? 

E.4 Unintended use: Have we taken steps to identify and prevent
unintended uses and abuse of the model, and do we have a plan to monitor
these once the model is deployed?
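
Monitoring of the kind item E.3 asks for could start from something as simple as the rolling check below, which compares recent accuracy with a baseline measured at deployment time. The `DriftMonitor` class, the window size, and the tolerance are all hypothetical, and a drop in accuracy is only an indirect signal of concept drift. Running one monitor per affected group would tie the check more closely to fairness.

```python
from collections import deque


class DriftMonitor:
    """Flag when accuracy over a recent window falls below a baseline."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)  # True where prediction was correct

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def drifted(self):
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.baseline - self.tolerance
```

A positive result from `drifted()` is a natural trigger for the rollback path in item E.2.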